The packages used in this tutorial can be installed with the following commands:
install.packages("sf")
install.packages("spData")Let’s start by loading the packages we will require for this section of the workshop:
library("tidyverse")
library("sf")
library("spData")As we have seen, the geographic vector model is based on points located within a coordinate reference system (CRS). Points can represent self-standing features (e.g., the location of a bus stop) or they can be linked together to form more complex geometries such as lines and polygons. Most point geometries contain only two dimensions (3-dimensional CRSs contain an additional z value, typically representing height above sea level).
Simple Features is a hierarchical data model that represents a wide range of geometry types. Of 17 geometry types supported by the specification, only 7 are used in the vast majority of geographic research; these core geometry types are fully supported by the R package sf.
Simple feature types fully supported by sf.
sf can represent all common vector geometry types (raster data classes are not supported by sf): points, lines, polygons and their respective ‘multi’ versions (which group together features of the same type into a single feature). sf also supports geometry collections, which can contain multiple geometry types in a single object. sf largely supersedes the sp ecosystem, which comprises sp [@R-sp], rgdal for data read/write [@R-rgdal] and rgeos for spatial operations [@R-rgeos]. The package is well documented, as can be seen on its website and in 6 vignettes, which can be loaded as follows:
vignette(package = "sf") # see which vignettes are available
vignette("sf1") # an introduction to the packageAs the first vignette explains, simple feature objects in R are stored in a data frame, with geographic data occupying a special column, usually named ‘geom’ or ‘geometry’.
We will use the world dataset. Let’s load it!
A wide selecton of spatial data formats can be read using the st_read command (it’s using GDAL/OGR behind the scenes). Unlike readOGR from the rgdal package, generally the command can guess which driver it should use.
world = st_read("data/world.shp")## Reading layer `world' from data source `C:\Users\grapacciuolo\Dropbox\teaching\workshops\R_workshops-Cal_Academy-Apr2019\Introduction_to_GIS_in_R\data\world.shp' using driver `ESRI Shapefile'
## Simple feature collection with 177 features and 10 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -90 xmax: 180 ymax: 83.64513
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
world is a spatial object provided by the spData (see nowosad.github.io/spData for a list of datasets loaded by the package) containing spatial and attribute columns, the names of which are returned by the function names() (the last column contains the geographic information):
names(world)## [1] "iso_a2" "name_long" "continent" "region_un" "subregion"
## [6] "type" "area_km2" "pop" "lifeExp" "gdpPercap"
## [11] "geometry"
The contents of this geom column give sf objects their spatial powers: world$geom is a ‘list column’ that contains all the coordinates of the country polygons.
The sf package provides a plot() method for visualizing geographic data: the following command creates:
plot(world)A spatial plot of the world using the sf package, with a facet for each attribute.
Note that instead of creating a single map, as most GIS programs would, the plot() command has created multiple maps, one for each variable in the world datasets. This behavior can be useful for exploring the spatial distribution of different variables.
Being able to treat spatial objects as regular data frames with spatial powers has many advantages, especially if you are already used to working with data frames.
The commonly used summary() function, for example, provides a useful overview of the variables within the world object.
summary(world["lifeExp"])## lifeExp geometry
## Min. :50.62 MULTIPOLYGON :177
## 1st Qu.:64.96 epsg:4326 : 0
## Median :72.87 +proj=long...: 0
## Mean :70.85
## 3rd Qu.:76.78
## Max. :83.59
## NA's :10
Although we have only selected one variable for the summary command, it also outputs a report on the geometry. This demonstrates the ‘sticky’ behavior of the geometry columns of sf objects, meaning that geometries are kept unless the user deliberately removes them. The result provides a quick summary of both the non-spatial and spatial data contained in world: the mean average life expectancy is 71 years (ranging from less than 51 to more than 83 years with a median of 73 years) across all countries.
MULTIPOLYGON in the summary output above refers to the geometry type of features (countries) in the world object. This representation is necessary for countries with islands such as Indonesia and Greece.
It is worth taking a deeper look at the basic behavior and contents of this simple feature object, which can usefully be thought of as a ‘spatial data frame’.
sf objects are easy to subset. The code below shows its first two rows and three columns. The output shows two major differences compared with a regular data.frame: the inclusion of additional geographic data (geometry type, dimension, bbox and CRS information - epsg (SRID), proj4string), and the presence of a geometry column, here named geom:
world_mini = world[1:2, 1:3]
world_mini## Simple feature collection with 2 features and 3 fields
## geometry type: MULTIPOLYGON
## dimension: XY
## bbox: xmin: -180 ymin: -18.28799 xmax: 180 ymax: -0.95
## epsg (SRID): 4326
## proj4string: +proj=longlat +datum=WGS84 +no_defs
## iso_a2 name_long continent geometry
## 1 FJ Fiji Oceania MULTIPOLYGON (((180 -16.067...
## 2 TZ Tanzania Africa MULTIPOLYGON (((33.90371 -0...
All this may seem rather complex, especially for a class system that is supposed to be simple. However, there are good reasons for organizing things this way and using sf.
Simple features is a widely supported data model that underlies data structures in many GIS applications including QGIS and PostGIS. A major advantage of this is that using the data model ensures your work is cross-transferable to other set-ups, for example importing from and exporting to spatial databases.
A more specific question from an R perspective is “why use the sf package when sp is already tried and tested”? There are many reasons (linked to the advantages of the simple features model) including:
%>% operator and works well with the tidyverse collection of R packages.st_).Due to such advantages, some spatial packages (including tmap, mapview and tidycensus) have added support for sf. However, it will take many years for most packages to transition and some will never switch. Fortunately, these can still be used in a workflow based on sf objects, by converting them to the Spatial class used in sp:
library(sp)
world_sp = as(world, Class = "Spatial")
# sp functions ...Spatial objects can be converted back to sf in the same way or with st_as_sf():
world_sf = st_as_sf(world_sp)Basic maps are created in sf with plot(). By default this creates a multi-panel plot (like sp’s spplot()), one sub-plot for each variable of the object.
A legend or ‘key’ with a continuous color is produced if the object to be plotted has a single variable (see the right-hand panel). Colors can also be set with col =, although this will not create a continuous palette or a legend.
plot(world[3:6])
plot(world["pop"])Plotting with sf, with multiple variables (left) and a single variable (right).
Plots are added as layers to existing images by setting add = TRUE. plot()ing of sf objects uses sf:::plot.sf() behind the scenes. plot() is a generic method that behaves differently depending on the class of object being plotted.
To demonstrate this, the subsequent code chunk combines countries in Asia:
world_asia = world[world$continent == "Asia", ]
asia = st_union(world_asia)We can now plot the Asian continent over a map of the world. Note that the first plot must only have one facet for add = TRUE to work. If the first plot has a key, reset = FALSE must be used (result not shown):
plot(world["pop"], reset = FALSE)
plot(asia, add = TRUE, col = "red")Adding layers in this way can be used to verify the geographic correspondence between layers: the plot() function is fast to execute and requires few lines of code, but does not create interactive maps with a wide range of options. For more advanced map making we recommend using dedicated visualization packages such as tmap, which we will introduce later in this workshop.
one of the advantages of sf is that sf objects behave like regular data frames, as illustrated by other sf-specific methods that were originally developed for data frames:
methods(class = "sf") # methods for sf objects, first 12 shown#> [1] aggregate cbind coerce
#> [4] initialize merge plot
#> [7] print rbind [
#> [10] [[<- $<- show Many of these functions, including rbind() (for binding rows of data together) and $= (for creating new columns) were developed for data frames.
sf objects also support tibble and tbl classes used in the tidyverse, allowing ‘tidy’ data analysis workflows for spatial data. Thus sf enables the full power of R’s data analysis capabilities to be unleashed on geographic data. Before using these capabilities it is worth re-capping how to discover the basic properties of vector data objects. Let’s start by using base R functions to get a measure of the world dataset:
dim(world) # it is a 2 dimensional object, with rows and columns## [1] 177 11
nrow(world) # how many rows?## [1] 177
ncol(world) # how many columns?## [1] 11
Let’s now load dplyr from the tidyverse package.
dplyr works well with the ‘pipe’ operator %>%. This enables expressive code: the output of a previous function becomes the first argument of the next function, enabling chaining. This is illustrated below, in which only countries from Asia are filtered from the world dataset, next the object is subset by columns (name_long and continent) and the first five rows (result not shown).
world_filtered = world %>%
dplyr::filter(continent == "Asia") %>%
dplyr::select(name_long, continent) %>%
dplyr::slice(1:5)We can also summarize sf objects using summarize() from the dplyr package.
world_aggregated = world %>%
dplyr::group_by(continent) %>%
dplyr::summarize(pop = sum(pop, na.rm = TRUE))Similarly, you can also join sf objects using left_join and other joins. The sf package enables taking advantage of the full power of the tidyverse!
This section provides an overview of spatial operations on vector geographic data represented as simple features in the sf package. Let’s generate 10 random points to illustrate some of the most useful spatial operations we may want to carry out.
set.seed(2018) # set seed for reproducibility
bb_world = st_bbox(world) # the world's bounds
print(bb_world)## xmin ymin xmax ymax
## -180.00000 -90.00000 180.00000 83.64513
random_df = tibble(
x = runif(n = 10, min = bb_world[1], max = bb_world[3]),
y = runif(n = 10, min = bb_world[2], max = bb_world[4])
)
random_points = random_df %>%
st_as_sf(coords = c("x", "y")) %>% # set coordinates
st_set_crs(4326) # set geographic CRSA simple query we may often want to run is: which of a set of points intersect with a given polygon? For instance, we might want to know if any of the points we generated fall within the continent of Africa. This is implemented in sf as follows:
africa = world %>% filter(continent == "Africa")
st_intersects(random_points, africa)## although coordinates are longitude/latitude, st_intersects assumes that they are planar
## Sparse geometry binary predicate list of length 10, where the predicate was `intersects'
## 1: (empty)
## 2: 43
## 3: (empty)
## 4: (empty)
## 5: (empty)
## 6: (empty)
## 7: (empty)
## 8: (empty)
## 9: (empty)
## 10: (empty)
The contents of the result should be as you expected: the function returns a positive (1) result for the first two points, and a negative result (represented by an empty vector) for the last two. What may be unexpected is that the result comes in the form of a list of vectors. This sparse matrix output only registers a relation if one exists, reducing the memory requirements of topological operations on multi-feature objects. As we saw in the previous section, a dense matrix consisting of TRUE or FALSE values for each combination of features can also be returned when sparse = FALSE:
st_intersects(random_points, africa, sparse = FALSE)## although coordinates are longitude/latitude, st_intersects assumes that they are planar
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10] [,11]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [,12] [,13] [,14] [,15] [,16] [,17] [,18] [,19] [,20] [,21] [,22]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [,23] [,24] [,25] [,26] [,27] [,28] [,29] [,30] [,31] [,32] [,33]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [,34] [,35] [,36] [,37] [,38] [,39] [,40] [,41] [,42] [,43] [,44]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [,45] [,46] [,47] [,48] [,49] [,50] [,51]
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [7,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [8,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [9,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [10,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
The output is a matrix in which each row represents a feature in the target object and each column represents a feature in the selecting object. In this case, only the first two features in p intersect with a and there is only one feature in a so the result has only one column. The result can be used for subsetting as we saw in Section @ref(spatial-subsetting).
Note that st_intersects() returns TRUE for the second feature in the object p even though it just touches the polygon a: intersects is a ‘catch-all’ topological operation which identifies many types of spatial relation.
The opposite of st_intersects() is st_disjoint(), which returns only objects that do not spatially relate in any way to the selecting object (note [, 1] converts the result into a vector):
st_disjoint(random_points, africa, sparse = FALSE)[, 1]## although coordinates are longitude/latitude, st_intersects assumes that they are planar
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
st_within() returns TRUE only for objects that are completely within the selecting object. This applies only to the first object, which is inside the triangular polygon, as illustrated below:
st_within(random_points, africa, sparse = FALSE)[, 1]## although coordinates are longitude/latitude, st_within assumes that they are planar
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Joining two non-spatial datasets relies on a shared ‘key’ variable, as described in Section @ref(vector-attribute-joining). Spatial data joining applies the same concept, but instead relies on shared areas of geographic space (it is also know as spatial overlay). As with attribute data, joining adds a new column to the target object (the argument x in joining functions), from a source object (y).
The process can be illustrated by an example. Of the points that are on land, which countries are they in?
The scenario is illustrated in the figure below. The random_points object (top left) has no attribute data, while the world (top right) does. The spatial join operation is done by st_join(), which adds the name_long variable to the points, resulting in random_joined which is illustrated below (bottom left — see 04-spatial-join.R). Before creating the joined dataset, we use spatial subsetting to create world_random, which contains only countries that contain random points, to verify the number of country names returned in the joined dataset should be four (see the top right panel below).
world_random = world[random_points, ]
nrow(world_random)## [1] 4
random_joined = st_join(random_points, world["name_long"])By default, st_join() performs a left join (see Section @ref(vector-attribute-joining)), but it can also do inner joins by setting the argument left = FALSE. Like spatial subsetting, the default topological operator used by st_join() is st_intersects(). This can be changed with the join argument (see ?st_join for details). In the example above, we have added features of a polygon layer to a point layer. In other cases, we might want to join point attributes to a polygon layer. There might be occasions where more than one point falls inside one polygon. In such a case st_join() duplicates the polygon feature: it creates a new row for each match.
Like attribute data aggregation, covered in Section @ref(vector-attribute-aggregation), spatial data aggregation can be a way of condensing data. Aggregated data show some statistics about a variable (typically average or total) in relation to some kind of grouping variable. Section @ref(vector-attribute-aggregation) demonstrated how aggregate() and group_by() %>% summarize() condense data based on attribute variables. This section demonstrates how the same functions work using spatial grouping variables.
Let’s have a look at the nz and nz_height datasets in spData. These contain projected data on the 16 main regions and 101 highest points in New Zealand, respectively. Imagine you want to find out the average height of high points in each region. This is a good example of spatial aggregation: it is the geometry of the source (y or nz in this case) that defines how values in the target object (x or nz_height) are grouped. This result can be generated using the ‘tidy’ functions group_by() and summarize() (used in combination with st_join()):
nz_avheight = nz %>%
st_join(nz_height) %>%
group_by(Name) %>%
summarize(elevation = mean(elevation, na.rm = TRUE))The resulting nz_avheight objects have the same geometry as the aggregating object nz but with a new column representing the mean average height of points within each region of New Zealand (other summary functions such as median() and sd() can be used in place of mean()). Note that regions containing no points have an associated elevation value of NA.
The distance between two objects is calculated with the st_distance() function. This is illustrated in the code chunk below, which finds the distance between the highest point in New Zealand and the geographic centroid of the Canterbury region:
nz_heighest = nz_height %>% top_n(n = 1, wt = elevation)
canterbury_centroid = nz %>%
filter(Name == "Canterbury") %>%
st_centroid()## Warning in st_centroid.sf(.): st_centroid assumes attributes are constant
## over geometries of x
st_distance(nz_heighest, canterbury_centroid)## Units: [m]
## [,1]
## [1,] 115540
There are two potentially surprising things about the result:
It has units, telling us the distance is 100,000 meters, not 100,000 inches, or any other measure of distance. It is returned as a matrix, even though the result only contains a single value. This second feature hints at another useful feature of st_distance(), its ability to return distance matrices between all combinations of features in objects x and y. This is illustrated in the command below, which finds the distances between the first three features in nz_height and the Otago and Canterbury regions of New Zealand represented by the object co.
co = nz %>% filter(grepl("Canter|Otag", Name))
st_distance(nz_height[1:3, ], co)## Units: [m]
## [,1] [,2]
## [1,] 123537.16 15497.72
## [2,] 94282.77 0.00
## [3,] 93018.56 0.00
Note that the distance between the second and third features in nz_height and the second feature in co is zero. This demonstrates the fact that distances between points and polygons refer to the distance to any part of the polygon
Geometric operations are more advanced than the spatial data operations presented above because they lead to some change in the geometry of vector (sf) objects, so we need to drill down into the geometry a little more and the class sfc underlying sf objects.
Centroid operations identify the center of geographic objects. Like statistical measures of central tendency (including mean and median definitions of ‘average’), there are many ways to define the geographic center of an object. All of them create single point representations of more complex vector objects.
The most commonly used centroid operation is the geographic centroid. This type of centroid operation (often referred to as ‘the centroid’) represents the center of mass in a spatial object (think of balancing a plate on your finger). Geographic centroids have many uses, for example to create a simple point representation of complex geometries, or to estimate distances between polygons. They can be calculated with the sf function st_centroid() as demonstrated in the code below, which generates the geographic centroids of regions in New Zealand and tributaries to the River Seine, illustrated with black points below:
nz_centroid = st_centroid(nz)
seine_centroid = st_centroid(seine)Buffers are polygons representing the area within a given distance of a geometric feature: regardless of whether the input is a point, line or polygon, the output is a polygon. Unlike simplification (which is often used for visualization and reducing file size) buffering tends to be used for geographic data analysis. How many points are within a given distance of this line? Which demographic groups are within travel distance of this new shop? These kinds of questions can be answered and visualized by creating buffers around the geographic entities of interest.
The figure below illustrates buffers of different sizes (5 and 50 km) surrounding the river Seine and tributaries. These buffers were created with commands below, which show that the command st_buffer() requires at least two arguments: an input geometry and a distance, provided in the units of the CRS (in this case meters):
seine_buff_5km = st_buffer(seine, dist = 5000)
seine_buff_50km = st_buffer(seine, dist = 50000)Spatial clipping is a form of spatial subsetting that involves changes to the geometry columns of at least some of the affected features.
Clipping can only apply to features more complex than points: lines, polygons and their ‘multi’ equivalents. To illustrate the concept we will start with a simple example: two overlapping circles with a center point one unit away from each other and a radius of one.
b = st_sfc(st_point(c(0, 1)), st_point(c(1, 1))) # create 2 points
b = st_buffer(b, dist = 1) # convert points to circles
plot(b)
text(x = c(-0.5, 1.5), y = 1, labels = c("x", "y")) # add textOverlapping circles.
Imagine you want to select not one circle or the other, but the space covered by both x and y. This can be done using the function st_intersection(), illustrated using objects named x and y which represent the left- and right-hand circles.
x = b[1]
y = b[2]
x_and_y = st_intersection(x, y)
plot(b)
plot(x_and_y, col = "lightgrey", add = TRUE) # color intersecting areaOverlapping circles with a gray color indicating intersection between them.
The subsequent code chunk demonstrates how this works for all combinations of the ‘Venn’ diagram representing x and y.
Spatial equivalents of logical operators.
As we saw in above, spatial aggregation can silently dissolve the geometries of touching polygons in the same group. This is demonstrated in the code chunk below in which 49 us_states are aggregated into 4 regions using base and tidyverse functions:
regions = us_states %>% group_by(REGION) %>%
summarize(pop = sum(total_pop_15, na.rm = TRUE))Spatial aggregation on contiguous polygons, illustrated by aggregating the population of US states into regions, with population represented by color. Note the operation automatically dissolves boundaries between states.
What is going on in terms of the geometries? Behind the scenes, summarize() combines the geometries and dissolve the boundaries between them using st_union(). This is demonstrated in the code chunk below which creates a united western US:
us_west = us_states[us_states$REGION == "West", ]
us_west_union = st_union(us_west)The function can take two geometries and unite them, as demonstrated in the code chunk below which creates a united western block incorporating Texas (challenge: reproduce and plot the result):
texas = us_states[us_states$NAME == "Texas", ]
texas_union = st_union(us_west_union, texas)